SqueezCL: Squeezing OpenCL Kernels for Approximate Computing on Contemporary GPUs

Authors

Atieh Lotfi

Abbas Rahimi

Hadi Esmaeilzadeh

Rajesh K. Gupta

Abstract

Approximate computing provides an opportunity to exploit application characteristics to improve the performance of computing systems. However, this opportunity must be balanced against the generality of the methods and the quality guarantees that the system designer can provide to the application developer. Improved parallel processing in graphics processing units (GPUs) provides one such means for data-level parallel applications. We propose SqueezCL, a software method to reduce the hardware resources used by an OpenCL kernel. SqueezCL transforms an exact OpenCL kernel into an approximate OpenCL kernel by squeezing the dimensions of its data elements. The core of SqueezCL leverages bitwidth reduction to shrink the hardware resources. Selectively reducing the precision and size of data elements generates approximate kernels that can be executed faster at the cost of some quality loss. Exploiting this opportunity is particularly important for GPU accelerators, which are inherently subject to memory resource constraints. We evaluate SqueezCL on a diverse set of data-level parallel OpenCL benchmarks from the AMD APP SDK v2.9. Experimental results on the AMD Radeon HD 5870 show that SqueezCL yields on average 1.1× higher performance with less than 10% quality loss, without requiring any changes to the underlying GPU hardware.
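The trade-off the abstract describes, smaller data elements in exchange for some quality loss, can be illustrated with a minimal sketch. This is not SqueezCL's transformation: it assumes, purely for illustration, that "squeezing" means storing 32-bit floats as 16-bit IEEE 754 half-precision values, and the sample data and helper names (`roundtrip`, `mean_relative_error`) are hypothetical. It runs on the CPU with the Python standard library only.

```python
import struct

def roundtrip(values, fmt):
    """Store values in the given struct format and read them back.

    Returns the rounded values and the number of bytes used.
    """
    layout = f"<{len(values)}{fmt}"
    packed = struct.pack(layout, *values)
    return list(struct.unpack(layout, packed)), len(packed)

def mean_relative_error(exact, approx):
    """Average of |exact - approx| / |exact| over nonzero exact values."""
    errors = [abs(e - a) / abs(e) for e, a in zip(exact, approx) if e != 0.0]
    return sum(errors) / len(errors) if errors else 0.0

# Hypothetical input data, not taken from the paper's benchmarks.
samples = [0.1, 1.5, 3.14159, 100.25]

exact, exact_bytes = roundtrip(samples, "f")    # 32-bit baseline
approx, approx_bytes = roundtrip(samples, "e")  # 16-bit reduced-precision copy
loss = mean_relative_error(exact, approx)

print(f"bytes: {exact_bytes} -> {approx_bytes}, mean relative error: {loss:.2e}")
```

In this sketch the memory footprint halves while the values drift slightly from the 32-bit baseline. How such a reduction translates into kernel speedup and output quality on a real GPU depends on the kernel and the hardware, which is what the paper's evaluation measures.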


Similar resources

Iterative statistical kernels on contemporary GPUs

We present a study of three important kernels that occur frequently in iterative statistical applications: Multi-Dimensional Scaling (MDS), PageRank, and K-Means. We implemented each kernel using OpenCL and evaluated their performance on NVIDIA Tesla and NVIDIA Fermi GPGPU cards using dedicated hardware, and in the case of Fermi, also on the Amazon EC2 cloud-computing environment. By examining ...


From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming

In this work, we evaluate OpenCL as a programming tool for developing performance-portable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide ...


Cooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended Version)

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the G...


Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs

We present a study of three important kernels that occur frequently in iterative statistical applications: K-Means, Multi-Dimensional Scaling (MDS), and PageRank. We implemented each kernel using OpenCL and evaluated their performance on an NVIDIA Tesla GPGPU card. By examining the underlying algorithms and empirically measuring the performance of various components of the kernel, we explored the...


Directive-Based Compilers for GPUs

General Purpose Graphics Computing Units can be effectively used for enhancing the performance of many contemporary scientific applications. However, programming GPUs using machine-specific notations like CUDA or OpenCL can be complex and time consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a conven...




Publication date: 2015
